Abstract

This work establishes the design, analysis, and fine-tuning of a Peak-to-Average-Power-Ratio (PAPR) reducing
system, based on compressed sensing at the receiver of a peak-reducing sparse clipper applied to an OFDM signal at
the transmitter. By exploiting the sparsity of the OFDM signal in the time domain relative to a pre-defined clipping
threshold, the method depends on partially observing the frequency content of extremely simple sparse clippers
to recover the locations, magnitudes, and phases of the clipped coefficients of the peak-reduced signal. We claim
that in the absence of optimization algorithms at the transmitter that confine the frequency support of clippers to a
predefined set of reserved-tones, no other tone-reservation method can reliably recover the original OFDM signal
with such low complexity.

Afterwards we focus on designing different clipping signals that can embed a priori information regarding the
support and phase of the peak-reducing signal to the receiver, followed by modified compressive sensing techniques
for enhanced recovery. This includes data-based weighted $\ell_1$ minimization for enhanced support recovery and
phase-augmentation for homogeneous clippers, followed by Bayesian techniques.

We show that using such techniques for a typical OFDM signal of 256 subcarriers and 20% reserved tones, the
PAPR can be reduced by approximately 4.5 dB with a significant increase in capacity compared to a system which
uses all its tones for data transmission and clips to such levels. The design is hence appealing from both capacity
and PAPR reduction aspects.

Index Terms

PAPR reduction, tone reservation techniques, compressive sensing, sparse signal estimation. 

I. Introduction

Despite the introduction of Single Carrier Frequency Division Multiple Access (SC-FDMA) into current
multicarrier transmission standards, the success of Orthogonal Frequency Division Multiplexing (OFDM) in
high data rate transmission remains truly remarkable, with no better proof than the fact that variants of the IEEE
802.16 and IEEE 802.18 standards are still emerging [1], [2].

The main problem with OFDM signalling, however, lies in the high temporal peaks relative to the signal mean,
portrayed in a parameter most commonly referred to as Peak-to-Average-Power-Ratio (PAPR).^1 Since an OFDM
signal is typically constructed by the superposition of a large number of modulated subcarriers, its envelope fluctuates
with significant variance, causing the high PAPR. This enforces the use of expensive power amplifiers that must
operate linearly over a wide range of signal amplitudes, and which also dissipate a lot of energy [3].
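To make the quantity concrete, the following is a minimal sketch of how the PAPR of one Nyquist-rate OFDM block could be computed. The QPSK mapping, the block size, and the function name `papr_db` are illustrative assumptions and are not taken from the system described in this paper.

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a complex block, in dB."""
    power = np.abs(x) ** 2
    return 10.0 * np.log10(power.max() / power.mean())

rng = np.random.default_rng(0)
N = 256
# Hypothetical QPSK data on all N subcarriers (illustration only)
d = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)
x = np.fft.ifft(d) * np.sqrt(N)      # unitary IDFT, Nyquist-rate samples
print(f"PAPR of this block: {papr_db(x):.2f} dB")
```

A constant-envelope block gives 0 dB, while a single nonzero sample among four gives about 6 dB, which is the kind of spread a linear power amplifier has to accommodate.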

Due to the monotonically increasing importance of OFDM signals, the problem of high PAPR has received 
considerable attention ever since OFDM was adopted in important communication standards (see [4], [5] for an 
overview). In the last decade, the problem of high PAPR in OFDM systems has been tackled by a variety of 
approaches, including coding techniques [6]-[8], selective mapping [9], [10], partial transmit sequences [11], [12], 
constellation expansion (also known as tone injection) [13]-[16], tone-reservation [17]-[19], and companding [25]- 
[27] to name a few. Although many of these reduction techniques are brilliant and very effective, the main obstacle 
limiting the implementation of most of them is commonly related to high complexity [3]. 

In this paper we design, fine-tune, implement, and analyze a novel tone-reservation based PAPR reducing system
that makes a radically different utilization of these tones compared to previous techniques. Such a utilization could
not have been practically developed without the implementation of algorithms capable of robust reconstruction from
partial frequency observations. Furthermore, the application we propose completely switches the stage at which
signal processing complexity is required from the transmitter's side to the receiver's side of the communication system,
and hence provides an alternate solution to different communication models where the transmitter's complexity is
a bottleneck.

^1 Some authors prefer using "PAR" instead for its simpler pronunciation. The fact remains, however, that the problem is in the high frequency
power amplifiers and hence the ratio of powers is the main concern in general.

We wish to establish that to the best of our knowledge this is the first work in the literature where PAPR reduction 
is achieved using compressive sensing (CS) [20]. The methods throughout will always assume sparsity of clipping 
events relative to a clipping threshold, and use null tones to estimate these events, providing the first application 
of the major work of Candes and Tao on recovering sparse signals from highly incomplete frequency information 
[31] in this context. As such, we also remove the obstacle faced by all previous tone-reservation-based PAPR 
reduction techniques beginning with the pioneering work of Tellado [16], [17] till very recently [21]-[24], all of 
which required careful construction of peak-reducing signals at the transmitter in order to keep them orthogonal to 
the data signal in the frequency domain. 

Afterwards, we branch off to many solutions to enhance the basic algorithm by designing different clipping 
techniques at the transmitter, modifying the CS algorithm to make use of a priori support and phase information, 
and pursuing Bayesian Estimation techniques for joint support and amplitude estimation at the final stage. 

Unless mentioned otherwise, we use lower case letters for (column) vectors and upper case letters for matrices.
Since we will be toggling extensively between the time domain and frequency domain, we will denote by $\check{x}$ the
Discrete Fourier Transform (DFT) of $x$, while we reserve the hat notation $\hat{x}$ to denote the estimate of $x$. We use
$x(i)$ to denote a scalar which is the $i$th coefficient of the vector $x$, while we reserve the subindex notation in $x_i$ to
denote a vector that is the $i$th column of the matrix $X$. Furthermore, we denote by $X^H$ the Hermitian conjugate of
$X$.

The vectors we treat throughout are complex in general and of dimension $N$. We denote by
$\|x\|_p = \left(\sum_{i=1}^{N} |x(i)|^p\right)^{1/p}$
the $\ell_p$-norm of a vector $x$, where $p$ could be an integer or a real number between zero and one. In the special case
where $p = 0$ the definition is modified to the pseudo-norm $\|x\|_0 = \sum_{i=1}^{N} q(i)$, where $q(i) = 1$ if $x(i) \neq 0$ and
$q(i) = 0$ otherwise.

Although we use the upper case letter $F$ for the Fourier matrix, it will be clear from context when we also use
it to denote the Cumulative Distribution Function (CDF) of a random variable $x$, $F_x(x)$, and the Complementary CDF,
$\bar{F}_x(x) = 1 - F_x(x)$. The Probability Density Function (PDF) will then be denoted by $f_x(x)$. We use $E[x^m]$ to
denote the $m$th central moment of a random variable $x$.



II. Transceiver Model 
We define the time-domain complex base-band transceiver model as

$$y(k) = \sum_{\ell=0}^{L-1} h(\ell)\, x(k-\ell) + z(k), \tag{1}$$

where $\{x(k)\}$ and $\{y(k)\}$ denote the channel scalar input and output, $h = (h_0, h_1, \ldots, h_{L-1})$ is the impulse
response of the channel, and $z(k) \sim \mathcal{CN}(0, \sigma_z^2)$ is AWGN. In matrix form this becomes

$$y = Hx + z \tag{2}$$

where $y$ and $x$ are the time-domain OFDM receive and transmit signal blocks (after cyclic prefix removal) and
$z \sim \mathcal{CN}(0, \sigma_z^2 I)$.

By the cyclic prefix, $H$ is a circulant matrix describing the cyclic convolution of the channel impulse response
with the block $x$ and can be decomposed into $H = F^H D F$, where $F$ denotes a unitary Discrete Fourier Transform
(DFT) matrix with $(k, \ell)$ element

$$[F]_{k,\ell} = N^{-1/2} e^{-j 2\pi k \ell / N}, \qquad k, \ell \in \{0, 1, \ldots, N-1\},$$

$D = \mathrm{diag}(\check{h})$, and $\check{h} = \sqrt{N} F h$ is the DFT of the channel impulse response.
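The decomposition $H = F^H D F$ can be checked numerically. The sketch below builds a small circulant channel matrix from a randomly drawn impulse response and compares it with the diagonalized form; the sizes and the random taps are arbitrary illustrative choices.

```python
import numpy as np

N, L = 8, 3
rng = np.random.default_rng(1)
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)   # hypothetical channel taps
h_pad = np.concatenate([h, np.zeros(N - L)])                # zero-padded to length N

# Circulant channel matrix: column k is h_pad cyclically shifted by k
H = np.column_stack([np.roll(h_pad, k) for k in range(N)])

# Unitary DFT matrix with entries N^{-1/2} exp(-j 2 pi k l / N)
idx = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)

D = np.diag(np.sqrt(N) * F @ h_pad)      # D = diag(sqrt(N) F h)
H_rebuilt = F.conj().T @ D @ F           # F^H D F
err = np.abs(H - H_rebuilt).max()
print(f"max |H - F^H D F| = {err:.2e}")
```

The diagonal of `D` equals `np.fft.fft(h_pad)`, which is why a known channel reduces to one complex gain per tone after the receiver DFT.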
III. Basic PAPR Reduction Design

The time-domain OFDM signal x is typically constructed by taking the IDFT of the data vector d whose entries 
are drawn from a generic constellation. Since this signal is of high PAPR, we add a peak-reducing signal c of 
arbitrary spectral support at the transmitter and then estimate it and subtract it from the demodulated signal at the 
receiver. 

In what follows, the main condition we impose on $c$ is that it be sparse in time. This is basically the case if we set
a clipping threshold $\gamma$ on the envelope of the OFDM symbols, or if the transmitter were to clip the highest $s$ peaks.
By the incoherence property of the time-frequency bases [31], this necessarily implies that $c$ is then dense (i.e.
non-sparse) in the frequency domain [40] and such a condition thus cannot be satisfied in methods where the data
and peak-reducing signal must occupy disjoint tones [17]-[19], [21]-[24]. We will denote by $\mathcal{I}_c = \{i : |c(i)| \neq 0\}$
the sparse temporal support of $c$, where $|\mathcal{I}_c| = s = \|c\|_0$.

Throughout this work, we will only consider clipping the Nyquist rate samples of the OFDM signal. Such a
restriction is unnecessary as it is irrelevant to the data-augmented CS methods we prescribe, but will otherwise
require more elaborate tools such as recent findings that deal with block sparsity [29], [30], and we are forced to
delay such topics for lack of space. With this in mind, following [57] and [59] we assume the entries of $x$ will be
uncorrelated and that the real and imaginary parts of $x$ are asymptotically Gaussian processes for large $N$. This
directly implies that the entries of $x$ are independent and that the envelope of $x$ can be modeled as a sequence of
iid Rayleigh random variables with a common CDF $F_{|x|}(|x|)$ and parameter $\sigma_{|x|}$, which we will use extensively
throughout.
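Under the Rayleigh envelope model, the probability that a sample exceeds a threshold, and hence the average number of clipped samples per block, has a simple closed form. The sketch below evaluates it and compares it with a Monte Carlo count. It assumes the Rayleigh parameter `sigma` is the standard deviation of each of the real and imaginary parts; the function names and the threshold value are illustrative.

```python
import numpy as np

def clip_probability(gamma, sigma):
    """P(|x| > gamma) for a Rayleigh envelope with parameter sigma."""
    return np.exp(-gamma**2 / (2.0 * sigma**2))

def expected_sparsity(N, gamma, sigma):
    """Mean of the Binomial(N, P(|x| > gamma)) number of clipped samples."""
    return N * clip_probability(gamma, sigma)

N, sigma = 256, 1.0
gamma = 2.5 * sigma
rng = np.random.default_rng(2)
# 2000 blocks of iid complex Gaussian samples (real/imag std = sigma)
x = sigma * (rng.standard_normal((2000, N)) + 1j * rng.standard_normal((2000, N)))
empirical = (np.abs(x) > gamma).sum(axis=1).mean()
theory = expected_sparsity(N, gamma, sigma)
print(f"expected clips per block: theory {theory:.2f}, simulated {empirical:.2f}")
```

The exponential dependence on the squared threshold is what keeps the clipping signal sparse for moderately high thresholds.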

Denoting $\Omega$ as the set of frequencies in an OFDM signal of cardinality $N$, let $\Omega_d \subset \Omega$ be the set of frequencies
that are used for data transmission and $\Omega_m = \Omega \setminus \Omega_d$ the complementary set reserved for measurement tones of
cardinality $|\Omega_m| = m$. Note that for compressive sensing purposes, a near optimal strategy is to use a random
assignment of tones for estimating $c$ [32].^2

The data symbols $d_i$ are drawn from a QAM constellation of size $M$ and are supported by $\Omega_d$ of cardinality
$|\Omega_d| = N - m = k$. Consequently, the transmitted peak-reduced time-domain signal is

$$\bar{x} = x + c = F^H S_x \check{d} + c \tag{3}$$

where $S_x$ is an $N \times k$ selection matrix containing only one element equal to 1 per column, and with $m$ zero rows.
The columns of $S_x$ index the subcarriers that are used for data transmission in the OFDM system. Similarly, we
denote by $S_m$ the $N \times m$ matrix with a single element equal to 1 per column, whose columns span the orthogonal
complement of the columns of $S_x$.

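Demodulation amounts to computing the DFT

$$\begin{aligned}
\check{y} = Fy &= F(H\bar{x} + z) \\
&= F\left(F^H D F (F^H S_x \check{d} + c) + z\right) \\
&= D S_x \check{d} + D F c + \check{z}
\end{aligned} \tag{4}$$

where $\check{z} = Fz$ has the same distribution as $z$ since $F$ is unitary. Assuming the channel is known at the receiver, we
can now estimate $c$ by projecting $\check{y}$ onto the orthogonal complement of the signal subspace, leaving us with

$$\begin{aligned}
y' &= S_m^T \check{y} \\
&= S_m^T D F c + z' \\
&= \Psi c + z'.
\end{aligned} \tag{5}$$

Note that $z' = S_m^T F z$ is an $m \times 1$ i.i.d. Gaussian vector with covariance matrix $\sigma_z^2 I_{m \times m}$.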
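The key point of the projection is that the data vanish on the reserved tones, so those tones observe the clipping signal alone. The following sketch illustrates this for an ideal channel ($D = I$) without noise; the tone allocation, the QPSK data, and the two clip values are arbitrary assumptions made only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, m = 64, 16
tones = rng.permutation(N)
meas_idx = np.sort(tones[:m])        # reserved (measurement) tones
data_idx = np.sort(tones[m:])        # data-carrying tones

# Frequency-domain data: QPSK on data tones, zeros on reserved tones
d_check = np.zeros(N, dtype=complex)
d_check[data_idx] = (rng.choice([-1, 1], N - m)
                     + 1j * rng.choice([-1, 1], N - m)) / np.sqrt(2)

F = np.fft.fft(np.eye(N)) / np.sqrt(N)    # unitary DFT matrix
x = F.conj().T @ d_check                  # time-domain OFDM block

# Sparse time-domain peak-reducing signal (two hypothetical clips)
c = np.zeros(N, dtype=complex)
c[[5, 40]] = [0.8 - 0.3j, -0.5 + 0.6j]

y_check = F @ (x + c)                     # receiver DFT, ideal channel, no noise
y_proj = y_check[meas_idx]                # rows selected by S_m^T
Psi = F[meas_idx, :]                      # S_m^T D F with D = I

print(np.allclose(y_proj, Psi @ c))       # the data do not appear on reserved tones
```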

The observation vector $y'$ is a projection of the sparse $N$-dimensional peak-reducing signal $c$ onto a basis of
dimension $m < N$ corrupted by $z'$. To demonstrate how such an $N$-dimensional vector can be estimated from
$m$ linear measurements, we refer the reader to [31], [32], [37]-[39], [41]-[43], which also investigate theoretical
bounds on $m$, $s$, and $N$ for guaranteed recovery under various conditions. Note that in our case, the number
of measurements $m$ is equivalent to the number of reserved tones, while the number of clipped coefficients is
equivalent to $s$, and hence the amount of clipping should be below certain bounds for reliable recovery given a
fixed number of tones $m$. However, these generic CS bounds will be significantly relaxed to our advantage in the
second part of the paper when we exploit background information from the data vector $x$.

^2 Based on results in [28] it was found in [20] and [45] that by using difference sets, one is able to boost the performance of the recovery
algorithm and reduce the symbol error rate.

Fig. 1: Clipping and Tone Reservation

Now coming back to our problem, assume the peak reducing signal $c$ is $s$-sparse in time. Given $y'$ in (5), we can
use any compressive sensing technique at the receiver to estimate $c$. We will follow the mainstream CS literature
and use a convex relaxation of an otherwise NP-hard problem [39] such as

$$\min_{c \in \mathbb{C}^N} \; \|y' - \Psi c\|_p^p + \lambda \|c\|_1 \tag{6}$$

for recovery, where $p$ is either 1 (for basis pursuit [36]) or 2 (for LASSO [54]) and $\lambda$ is a parameter for adjusting
the sparsity penalty. The resulting solution by compressive sensing alone is an estimate $\hat{c}^{cs}$ of the peak reducing
signal which not only reliably detects the positions of its nonzero entries, but also gives a good approximation to the
corresponding amplitudes. Notice however that the estimation of $c$ is by no means restricted to convex relaxations
such as (6), and any compressive sensing method is valid in general, thus opening the door for many possible
improvements in regard to complexity and efficiency.

Fig. 1 illustrates the main points we've described so far, although caution must be taken as the actual OFDM 
signal is generally complex. 

The block diagram in Fig. 2 stresses that upon observing y, the receiver is confronted with two estimation 
problems, the first is the typical estimation of the transmitted (clipped) OFDM signal x, and the second is the 
estimation of the peak reducing signal c. Although the noise statistics are the same in both cases, the estimation 
SNR is nevertheless very different, depending on the clipping procedure. We will hence reserve the SNR notation 
for the received signal-to-noise-ratio and denote by CNR the clipper-to-noise-ratio which is defined as 



CNR 



E[\\Ekex.<k)^k 



(7) 



and hence depends on the sparsity level $\|c\|_0 = |\mathcal{I}_c|$ and the magnitudes of $\{c(k)\}_{k \in \mathcal{I}_c}$, which are both functions of
the clipping threshold $\gamma$. This is the parameter of concern when it comes to compressive sensing in this paper. By
definition, the CNR is typically less than the SNR since the energy of $c$ leaks onto all the subcarriers even though
the CS algorithm only has access to $m$ of them, and also since the magnitudes of the nonzero coefficients of $c$ are
practically smaller than those of $x$.

Note that in using CS our objective is to find the support $\mathcal{I}_c$ of the sparse signal and its complex coefficients
$\{v_c(k)\}_{k \in \mathcal{I}_c}$ at those locations. We could hence decompose the two problems into $c = S_c v_c$ and use CS for the first
problem only, giving us $\hat{S}_c^{cs}$ based on $\hat{\mathcal{I}}_c^{cs}$, then refine our coefficient estimate by a more robust technique such
as least squares after conditioning on the detected support. To do so we define the $m \times s$ matrix $\Phi = \Psi \hat{S}_c^{cs}$ and
refine our amplitude estimate to

$$\hat{v}_c^{ls|cs} = (\Phi^H \Phi)^{-1} \Phi^H y' \tag{8}$$

from which $\hat{c}^{ls|cs} = \hat{S}_c^{cs} \hat{v}_c^{ls|cs}$ follows. This dual approach is necessary in order to approach an oracle receiver that
uses least squares (see the interesting discussion in [42]).

Fig. 2: Block Diagram of Basic Design
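The two-stage idea (detect the support, then re-estimate the amplitudes by least squares on that support) can be sketched as follows. Since any CS method is admissible, the sketch uses orthogonal matching pursuit for the support stage purely because it needs no external solver; this is a swapped-in illustrative choice, not the convex program (6). The problem sizes, clip positions, and function names are assumptions.

```python
import numpy as np

def omp_support(Psi, y, s):
    """Greedy (OMP) estimate of an s-element support of c from y = Psi c + noise."""
    residual, support = y.copy(), []
    for _ in range(s):
        corr = np.abs(Psi.conj().T @ residual)
        if support:
            corr[support] = 0.0                   # never pick an index twice
        support.append(int(np.argmax(corr)))
        Phi = Psi[:, support]
        v, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        residual = y - Phi @ v
    return sorted(support)

def ls_refine(Psi, y, support, N):
    """Least-squares amplitudes conditioned on a given support."""
    Phi = Psi[:, support]                         # m x s matrix
    v, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # (Phi^H Phi)^{-1} Phi^H y
    c_hat = np.zeros(N, dtype=complex)
    c_hat[support] = v
    return c_hat

rng = np.random.default_rng(4)
N, m, s = 64, 24, 2
meas_idx = np.sort(rng.choice(N, size=m, replace=False))
Psi = (np.fft.fft(np.eye(N)) / np.sqrt(N))[meas_idx, :]   # ideal channel, D = I
true_support = [5, 40]
c = np.zeros(N, dtype=complex)
c[true_support] = [1.0 - 0.5j, -0.7 + 0.9j]
y = Psi @ c                                               # noise-free measurements

support_hat = omp_support(Psi, y, s)
c_hat = ls_refine(Psi, y, support_hat, N)
c_oracle = ls_refine(Psi, y, true_support, N)             # oracle receiver
print("detected support:", support_hat)
```

With the true support supplied, the least-squares step recovers the clipping signal exactly in this noise-free setting, which is the oracle behaviour the dual approach aims to approach.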

IV. Comparison with Typical Tone-Reservation PAPR Reduction Techniques 

The common function of reserved tones in the literature is to act as a frequency support for the peak reducing
signal that is disjoint from the data-carrying tones [17]-[19], [22]-[24]. In other words, for each OFDM signal
a search is conducted for some signal $c$ that will reduce the PAPR while being spectrally confined to a limited
number of tones such that $\|\check{c}\|_2 - \|S_m^T \check{c}\|_2 = 0$ and hence $\check{c}^H \check{d} = 0$. Although many different methods exist to
find such a signal, we only mention the well-known work of Tellado [17] for brevity, which requires solving the
convex optimization problem

$$\min_{\check{c}} \; t \quad \text{s.t.} \quad \|x + F^H S \check{c}\|_\infty < t \tag{9}$$

where $\check{c} = Fc$ is nonzero only on $\Omega_m$ from the definition of $S$. Clearly, this optimization approach should result in
significantly more PAPR reduction compared to our design, since for the same number of reserved tones $m$, we
can only clip $s < m$ maximum peaks, whereas by Tellado's method no such restriction exists.

Most importantly however, the main complexity (i.e. the stage at which the optimization search is performed) in
these techniques is at the transmitter, since the main concern is to find $c$ that will reduce the PAPR while occupying
completely disjoint tones in order to remain discernible at the receiver.

V. Enhanced PAPR Reduction by Data-Induced Weighted and Phase-Augmented $\ell_1$ Minimization

So far we were only interested in using compressive sensing in its most abstract form as it applies to our problem.
We assumed, following the general literature on CS, that absolutely no information is known about the locations,
magnitudes, and phases of the sparse signal $c$, beyond the incomplete frequency observations which we obtained
from the reserved tones $\Omega_m$ [31], [32]. In other words, the model $y' = \Psi c + z'$ was assumed to exist independently
of the general transceiver model $y = Hx + z$, even though in reality we know that $c$ is intimately linked to $x$ by
the simple fact that it is superimposed on $x$ in the time domain.

The upshot of this section is to demonstrate that for optimal PAPR reduction using CS, the estimation of the 
clipping signal at the receiver should exploit as much information as possible in both basis representations, which 
can be achieved by weighting, constraining, or rotating the frequency-based CS search, based on information we 
infer from the data in the time domain. 

The difficulty of these problems is strongly related to the way clipping is performed. Although we have full
control in selecting the sparsity level and the clipping magnitudes and phases to best suit our purpose, there cannot
be a clipping technique that optimizes both the support recovery and coefficient estimation, and a compromise must
be made regarding the quality of the two.

A. Homogeneous Clipping Techniques

We begin by defining two simple clipping techniques that do not require any optimization or spectral
confinement, and although we derive their PDFs along with other properties, we focus exclusively on deterministic CS
enhancement techniques^3, and delay the matter of Bayesian compressive estimation or sensing to the following
section.

1) Peak Suppression to $\gamma$ (PS): Because clipping is done on the coefficients of $x$ whose envelope exceeds $\gamma$,
the most natural construction of the clipping signal $c$ would be to simply suppress the magnitudes of the entries
$\{x(i) : |x(i)| > \gamma\}$ to $\gamma$ while preserving their angles, such that $|x(i) + c(i)| = \gamma$ (see Fig. 3). This is commonly expressed
in the literature [58], [60] as

$$\bar{x}(i) = \begin{cases} \gamma e^{j\theta_{x(i)}} & \text{if } |x(i)| > \gamma, \\ x(i) & \text{otherwise.} \end{cases} \tag{10}$$

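Obviously, the PDF of the nonzero coefficients of $c^{ps}$ will depend on the PDF of $|x|$ given $|x| > \gamma$. Hence if we define
the binary set $Q$ to label the mutually exclusive events of clipping or not at a certain index $i$, then

$$\begin{aligned}
f_{|c^{ps}|}(|c^{ps}(i)|) &= f_{|x| \,\mid\, |x| > \gamma}(|c^{ps}(i)| + \gamma)\, \bar{F}_{|x|}(\gamma) + \delta(|c^{ps}(i)|)\, F_{|x|}(\gamma) \\
&= \alpha^{-1}(\gamma)\, \bar{F}_{|x|}(\gamma)\, f_{|x|}(|c^{ps}(i)| + \gamma)\, u(|c^{ps}(i)|) + F_{|x|}(\gamma)\, \delta(|c^{ps}(i)|)
\end{aligned} \tag{11}$$

where $u(\cdot)$ is the unit step function and $\alpha(\gamma) = \int_{\gamma}^{\infty} f_{|x|}(|x|)\, d|x|$ is a normalizing constant which depends only on
$\gamma$ and is required to ensure that $\int_{\gamma}^{\infty} f_{|x| \,\mid\, |x| > \gamma}(|x| \mid |x| > \gamma)\, d|x| = 1$. Not surprisingly, this is the most popular soft
clipping scheme due to its simplicity and relatively low spectral distortion.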
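As an illustration of peak suppression, the sketch below clips every sample whose envelope exceeds the threshold down to the threshold while keeping its phase, and returns the implied clipping signal. The function name and the three-sample test vector are illustrative assumptions.

```python
import numpy as np

def peak_suppress(x, gamma):
    """Clip |x(i)| to gamma, keeping the phase. Returns (x_bar, c) with x_bar = x + c."""
    mag = np.abs(x)
    clipped = mag > gamma
    x_bar = x.copy()
    x_bar[clipped] = gamma * x[clipped] / mag[clipped]   # gamma * exp(j * theta_x)
    return x_bar, x_bar - x

x = np.array([3 + 4j, 1 + 0j, -2j])       # envelopes 5, 1, 2
x_bar, c = peak_suppress(x, gamma=2.5)
print(x_bar)    # only the first sample is pulled back to magnitude 2.5
print(c)        # sparse clipping signal, nonzero only where clipping occurred
```

Dividing a nonzero entry of `c` by the corresponding data sample gives a negative real number, which is the anti-phase relationship between the clipper and the data.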

Two features of this clipping scheme stand out in regard to CS enhancement. The first is that by suppressing all
the data coefficients to a fixed and known threshold value $\gamma$, we could actually infer some additional information
regarding possible clipping locations from the distance between the estimated coefficients' magnitudes and $\gamma$. This
clipping scheme can hence provide additional information regarding the support $\mathcal{I}_c$. The second feature is that the
nonzero coefficients of $c^{ps}$ are exactly anti-phased with the data coefficients at $\mathcal{I}_c$^4, giving us another source of
information regarding the phases $\theta_{c^{ps}}(\mathcal{I}_c)$ based on $\hat{x}$.

In terms of detectability from standard compressive sensing, however, the method is quite unsatisfying if left
unenhanced, demanding a higher number of measurements for the same sparsity level and Symbol Error Rate
(SER) compared to other clipping techniques. The main reasons are



^3 Although the LASSO estimate has a MAP interpretation [54], we don't assume any prior or statistic is used.
^4 We will call such signals homogeneous clippers since their phases are aligned with the data.

Fig. 4: Clipping with Fixed Magnitude $\zeta$



1) Low CNR: The CNR in PS decreases very rapidly with 7. Assuming we neglect the effect of 



keXc 



= E[\d^%k)\'].E[\\c\\o] 

roo 

= / |cf^(fc)|V(|c^^(fc)|)d|c^'^(fc)| 



= a-i(7)(2aj^|+7')e - 7 



(12) 



where the average sparsity 



TV- 



•2(F|^|(7)) -7v(F|^|(7)) 
+ 7v(F|^|(7)) 



is simply the expectation of the Binomial corresponding to the sparsity level. Notice the accumulative effect
of $\gamma$ on $E[\|c^{ps}\|_2^2]$.

2) The vanishing of $\min |c^{ps}|$: the random magnitudes of $c^{ps}$ are drawn from the tail distributions of the data
coefficients, making the limiting distance between the minimum penetrating coefficient and $\gamma$ approach zero.
This is a critical bottleneck in CS that cannot be completely compensated for by increasing the CNR. Fletcher
et al. [41] and Wainwright [42]-[44] stress this point.

2) Digital-Magnitude Clipping (DMC): In order to avoid the problems of the previous clipping technique,
we could increment the magnitudes of $c^{ps}$ by some constant until we are satisfied with the CNR and $\min |c^{ps}|$. This
however still leaves us with the burden of estimating the random magnitudes while destroying the enhanced support
detection property of peak suppression. Instead, consider inverting the procedure from suppressing to a fixed value
$\gamma$, to suppressing by a fixed value $\zeta$.^5

Now that $\{|c(k)|\}_{k \in \mathcal{I}_c} = \zeta$, we have decreased the degrees of freedom of $c$ to $\mathcal{I}_c$ and $\theta_c$ only. Furthermore, such
a clipping scheme preserves the anti-phase property as well, thus possibly reducing the problem to that of support
detection.^6

More generally, we could suppress the high peaks of $x$ by a finite set of magnitudes $\{\zeta_0, \zeta_1, \ldots\}$, hence
the attribute of Digital Magnitude Clipping (or simply Digital Clipping for short), although we will only focus here
on the binary magnitude space $|c| \in \{0, \zeta\}$.
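A minimal sketch of the binary digital clipper follows: every sample above the threshold is reduced by a fixed magnitude in its own direction, so the clipping signal takes magnitudes in a two-element set. The function name, the test vector, and the values of the threshold and step are illustrative assumptions.

```python
import numpy as np

def digital_clip(x, gamma, zeta):
    """Reduce every sample with |x(i)| > gamma by a fixed magnitude zeta along its phase."""
    x = np.asarray(x, dtype=complex)
    mag = np.abs(x)
    clipped = mag > gamma
    c = np.zeros_like(x)
    c[clipped] = -zeta * x[clipped] / mag[clipped]   # fixed magnitude, anti-phased
    return x + c, c

x = np.array([3 + 4j, 1 + 0j, -2j])       # envelopes 5, 1, 2
x_bar, c = digital_clip(x, gamma=2.5, zeta=3.0)
nonzero = np.abs(c[np.abs(c) > 0])
print(x_bar[0], nonzero)                  # clipped sample and the single magnitude level
```

Because every nonzero entry has the same magnitude, the sum of the $p$th powers of the magnitudes equals $\zeta^p$ times the number of clips, so only the support and the phases remain unknown at the receiver.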

Following the same procedure in finding (11), and by noting the interesting relation $\|c\|_p^p = \zeta^p \|c\|_0$, $p = 1, 2, \ldots$,
the PDF of the clipping signal's envelope is basically



^5 Quite expectedly, in [41] it was shown that, with no modification or realization to this additional structure, a compressive estimation
algorithm works best when all the nonzero coefficients in $c$ are equal in magnitude.

^6 In the case of digital clipping with phase augmentation, the problem can also be recast as that of detecting a point on a sparse lattice,
and a regularized sphere decoding algorithm could be used [46]-[48].




(13) 






The PDF of a coefficient's magnitude has been reduced to a Bernoulli random variable with probability of success
$\bar{F}_{|x|}(\gamma)$. Furthermore, the two clipping methods PS and DMC achieve the same CNR when

C = y«-i(7) (2'^|x|+72) e^'/2"m-7. (14) 

There is a conflicting interest in deciding the value of $\zeta$. On one hand, the more we increase it the higher the
CNR and the easier the support detection becomes, but on the other, the overall error of the system dramatically
increases in case of faulty support detection. Furthermore, oversampling at the subsequent stage of transmission
becomes more complex in this latter case.

Nevertheless, we should at least set a lower bound on its value to ensure that all clipped coefficients will always
end up with magnitudes equal to or below the desired clipping threshold $\gamma$, depending on the envelope's
maximum order statistic. Afterwards, we should be very conservative in increasing $\zeta$.

B. Externally Weighted $\ell_1$ Minimization

If by some prior information we have a better picture regarding the support $\mathcal{I}_c$ beyond the Bernoulli process
assumption, we can modify the LASSO in (6) by penalizing disfavored locations so that

$$\hat{c} = \arg\min_{c} \; \|y' - \Psi c\|_2^2 + \lambda \|\mathrm{diag}(w)\, c\|_1, \tag{15}$$

where $w$ is a weighting vector imposed on the $\ell_1$ penalty term based on this prior information. In the literature, the
source of $w$ is from previous runs of the CS algorithm itself [34], [55], where the hope is that with each iteration
more confidence will exist in $\hat{\mathcal{I}}_c^{cs}$ based on, for instance [34],

$$w(i)^{(l+1)} \propto \left[\, |\hat{c}(i)^{(l)}| + \epsilon \,\right]^{-1}, \qquad i = 1, 2, \ldots, N \tag{16}$$

where $\epsilon > 0$ is a small stabilizing parameter. We will refer to this procedure as internally weighted $\ell_1$ minimization.
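A weighted $\ell_1$ penalty of this kind can be handled by any proximal method. The following sketch uses plain iterative soft thresholding (ISTA) with a per-coefficient threshold, together with the reweighting rule of (16). ISTA is an illustrative solver choice and not the one used in this paper; the sketch minimizes half the squared residual plus the weighted penalty, so its `lam` differs from the one in (15) by a factor of two. All function names are assumptions.

```python
import numpy as np

def weighted_ista(Psi, y, w, lam, n_iter=200):
    """Proximal gradient for 0.5*||y - Psi c||_2^2 + lam * sum_i w(i)|c(i)| (complex c)."""
    step = 1.0 / np.linalg.norm(Psi, 2) ** 2          # 1 / largest squared singular value
    c = np.zeros(Psi.shape[1], dtype=complex)
    for _ in range(n_iter):
        g = c + step * Psi.conj().T @ (y - Psi @ c)   # gradient step
        thr = step * lam * w                          # per-coefficient threshold
        mag = np.abs(g)
        c = g * np.maximum(1.0 - thr / np.maximum(mag, 1e-15), 0.0)  # complex soft threshold
    return c

def reweight(c_prev, eps=1e-3):
    """Internal weights: inversely proportional to the previous magnitudes plus eps."""
    return 1.0 / (np.abs(c_prev) + eps)

# Toy check with an identity measurement matrix: the solution is a soft threshold of y
c_toy = weighted_ista(np.eye(2, dtype=complex), np.array([1.0 + 0j, 1.0 + 0j]),
                      w=np.array([0.5, 2.0]), lam=1.0)
print(c_toy)   # the heavily weighted second coefficient is driven to zero
```

A one-shot external weighting simply calls `weighted_ista` once with weights derived from the data, whereas the internal scheme alternates `weighted_ista` and `reweight`.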

Repeating the CS algorithm is computationally expensive, and the process is sensitive to the quality of the first 
unguided CS estimate. Instead, we would rather use a one-shot weighting scheme that minimally increases the 
complexity of an ordinary LASSO. Fortunately, this could be done if we had an external source of information 
based on the data vector x. 

Recall the discussion in V-A1 regarding embedded information on the support $\mathcal{I}_c$ in peak suppression. The idea
is that we expect the coefficients of $\hat{x}$ whose magnitudes are close to $\gamma$ to be more probable clipping locations
compared to ones that are not. Consequently, we can define a weighting vector $w$ based on the distance

$$d(i) = \big|\, |\hat{x}(i)| - \gamma \,\big|, \qquad i = 1, 2, \ldots, N \tag{17}$$

and use it in (15). Another data-based weighting scheme would be the posterior probability of not having a clip
($q = 0$) given the observation (17), such that less likely clipping locations are more severely penalized by having
a higher such posterior probability

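$$\begin{aligned}
w(i) &= \Pr(q = 0 \mid d(i)) \\
&= \frac{\Pr(d(i) \mid q = 0)\Pr(q = 0)}{\sum_{q \in Q} \Pr(d(i) \mid q)\Pr(q)} \\
&= \frac{f_{|\hat{x}|}(\gamma - d(i))\, F_{|x|}(\gamma)}{f_{|\hat{x}|}(\gamma - d(i))\, F_{|x|}(\gamma) + f_{|E|}(d(i))\, \bar{F}_{|x|}(\gamma)}
\end{aligned} \tag{18}$$

where $f_{|E|}$ is the density function corresponding to the estimation error of the data envelope $|\hat{x}(i)|$, which is the sole
reason $d(i) > 0$ when conditioned on clipping $x(i)$. Using least squares to recover $x(i)$, we assume its error to be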
Gaussian and hence f\E\ and /|^| = f\x^E\ ^ be Rayleigh with parameters a\E\ and cr\x^E\ = ['^~^i^x + ^e)] 
respectively. Defining 77(7) = 1 — e"^^/^^, this becomes 



^(ir = . /, . A (19) 
where

^_ 277(7)(7-d(i)) , (7-rf(i))' 
, _ 2(1 - 7?(7))d(i) . _ d{i)' 

The second part of (19) is a necessary manipulation for numerical stability. 

Notice also that what helps in suppressing only to $\gamma$ here is that we have a probabilistic means to cast out most
of the possible false positives. Had we suppressed the magnitudes to the envelope mean for instance,
the procedure above would favor many locations as clipping positions by the fact that $|\hat{x}(i)| - E[|x(i)|]$ is small.
Nonetheless, misleading bias to certain locations as candidates for clipping positions due to their coefficients'
natural proximity to $\gamma$ can never be completely eliminated, even at infinite CNR.
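The distance-based weighting can be sketched in a few lines. The code below computes the distance in (17) from an estimated data vector and uses it directly as the weight, normalized to a maximum of one; taking the weight equal to the normalized distance is an assumption made for illustration, as are the function name and the sample values.

```python
import numpy as np

def distance_weights(x_hat, gamma):
    """Weights from the distance between each estimated envelope and the threshold.
    Small weight = envelope close to gamma = likely clipping location."""
    d = np.abs(np.abs(x_hat) - gamma)
    return d / d.max() if d.max() > 0 else d

x_hat = np.array([2.4 + 0j, 0.5 + 0j, 3j])     # estimated envelopes 2.4, 0.5, 3.0
w = distance_weights(x_hat, gamma=2.5)
order = np.argsort(w)                          # most likely clipping locations first
print(w, order)
```

Note that the second-ranked sample here has an envelope above the threshold only because of estimation error, which is the kind of ambiguity the posterior weighting is meant to resolve.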



C. Phase-Augmented CS for Homogeneous Clippers

In the case of homogeneous clipping, $\theta_c(\mathcal{I}_c) = \theta_x(\mathcal{I}_c)$ at the transmitter, and consequently the CS algorithm should
have access to additional information regarding the phases of the nonzero coefficients. The problem however is that
we only have an estimate $\theta_{\hat{x}}(\mathcal{I}_c)$ at the receiver, and the extent to which CS can benefit from this property depends
on how good the estimate $\hat{x}$ is in general. To this end, we will only consider the SNR as the parameter by which
we judge the quality of the data estimate.

Recall the discussion following Fig. 2 regarding the CNR and SNR, and consider the effect of gradually increasing
$\zeta$, which we defined in V-A2. Notice that when $\zeta = 0$, the $\gamma$-penetrating coefficient attains its maximum SNR, then
as we increase $\zeta$ the CNR increases as $\zeta^2 E[\|c\|_0]$ while the SNR decreases by $\zeta\,(2E[|x|] - \zeta)$. Consequently, the
CNR will be larger than the SNR in the locations where $\zeta^2 (E[\|c\|_0] - 1) + 2E[|x|]\,\zeta - E[|x|^2] > 0$. Fortunately,
practical values of $\zeta$ relative to $\gamma$ fall outside this region, and we would normally expect to gain information
regarding $\theta_c$ from $\hat{x}$ that is more reliable than information from CS alone.

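This fact encourages us to absorb, and perhaps even replace altogether, as much information as possible regarding
$\theta_c$ from the estimated data vector $\hat{x}$. Assume first that we know the vector $\theta_c$; we could then merge this information
into the CS algorithm by expressing the clipping signal as $c = \Theta_c |c|$ such that

$$c = \begin{bmatrix} e^{j\theta_c(1)} & & & \\ & e^{j\theta_c(2)} & & \\ & & \ddots & \\ & & & e^{j\theta_c(N)} \end{bmatrix} \begin{bmatrix} |c(1)| \\ |c(2)| \\ \vdots \\ |c(N)| \end{bmatrix} \tag{20}$$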

which could be directly fused into the measurement matrix, thus transforming our model from $y' = \Psi c + z'$ to
$y' = \Psi \Theta_c |c| + z'$, where $\Theta_c$
has now realigned the phases of the coefficients sought and reduced the problem to estimating a real sparse vector,
with only the locations and magnitudes of the nonzero coefficients of $c$ to be found. In the case of digital clipping,
we can then force the magnitudes to the nearest alphabets as well. In any case, with $\theta_c$ unknown prior to CS, we
will instead use $\theta_{\hat{x}} - 2\pi I_{N \times N}$ to augment the CS algorithm. This could be done in two ways:

1) Sense then Rotate (StR): Use the standard CS or weighted CS algorithms used so far to regain c^^^^ = 
aig^^QN mm{\\y — *c||2 + A||c||i} where PA stands for Phase Augmentation, extract the locations and 
magnitudes of the nonzero coefficients from c, and then rotate them according to the corresponding estimated 
directions in x. i.e. 



ieic 



(21) 
2) Rotate then Sense (RtS): In this case supply the CS algorithm with the phase information from x as described
above. This rotation prior to compressive sensing recasts the problem as an estimation of a real vector with 
2m real observations. Defining = ^©c. we're left with the following model 



y 



for which we use the following program to recover c 



(22) 



iRtS 



arg|c|eR 



Af mm 



{I 



y 



li + A||c|| 



(23) 



where ^^^^ (9^ — 27rlArxAr)- Notice that, similar to (21) one could also replace the phases of c^^^ with 
|gj(^^(i)-27r) I ^^^^j, ^23) but we have not observed any significant improvement in doing so. 
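The remark that rotating before sensing yields a real unknown with $2m$ real observations can be made concrete. The sketch below folds a given phase vector into the measurement matrix and stacks real and imaginary parts. It assumes the phases are known exactly (at the receiver they would come from the data estimate), and uses a random complex matrix in place of the reserved-tone matrix; all names are illustrative.

```python
import numpy as np

def rotate_then_stack(Psi, theta, y):
    """Fold known phases into Psi and stack real/imaginary parts: 2m real equations."""
    Psi_rot = Psi * np.exp(1j * theta)[None, :]     # Psi @ diag(exp(j * theta))
    A = np.vstack([Psi_rot.real, Psi_rot.imag])     # (2m) x N real matrix
    b = np.concatenate([y.real, y.imag])            # 2m real observations
    return A, b

rng = np.random.default_rng(5)
m, N = 6, 10
Psi = rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))
c = np.zeros(N, dtype=complex)
c[[2, 7]] = [0.5 + 0.5j, -1.0j]
A, b = rotate_then_stack(Psi, np.angle(c), Psi @ c)   # noise-free measurements
print(A.shape, np.allclose(A @ np.abs(c), b))
```

Any real-valued sparse solver can then be run on `A` and `b` to recover the nonnegative magnitudes.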

VI. Bayesian Estimation of Sparse Clipping Signals 



To take into account the statistical information at hand, we could simply modify the dual stage estimate in (8) to
a linear minimum mean-square (LMMSE) estimate of the amplitudes conditioned on the support estimate $\hat{\mathcal{I}}_c^{cs}$.

This should clearly improve upon the least squares estimate (8) in case the distribution of $v_c$ is Gaussian, but will
not be able to invoke any statistical information into the support estimate. Using a Maximum a Posteriori (MAP)
estimate $\hat{c} = \arg\max_c P(y \mid c) P(c)$ generally leads to non-convex optimization problems in sparse models, and we
refer instead to an MMSE estimate. First define $J_{\mathcal{I}}$ as the Hamming vector of length $N$ and Hamming weight $|\mathcal{I}|$
with active coefficients according to the support set $\mathcal{I}$. Then marginalizing on all such possible vectors we obtain

$$\hat{c}^{MMSE} = E[c \mid y] \propto \sum_{i=1}^{2^N} E[c \mid y, J_i]\, P(y \mid J_i)\, P(J_i) \tag{24}$$

with $P(y)$ dropped in (24) due to its independence of $i$. The estimate is a weighted sum of conditional
expectations, and the formal (exact) approach requires computing $2^N$ terms, which is a formidable task for large $N$.
To limit the search space, the key is to truncate the summation index to a much smaller subset of support vectors
$J^*$. As such, the weights $\{P(J_k \mid y)\}_{J_k \in J^*}$ will not sum up to unity, and we will need to mitigate this by normalizing
the truncated weighted sum by the sum of weights $W = \sum_{J_k \in J^*} P(y \mid J_k) P(J_k)$, hence reducing (24) to

$$\hat{c}^{MMSE} = \frac{1}{W} \sum_{J_k \in J^*} E[c \mid y, J_k]\, P(y \mid J_k)\, P(J_k). \tag{25}$$



In effect, estimating c in an MMSE criterion boils down to appropriately selecting J* and evaluating the terms 
P{Jk), P{y\Jk), and E[c\ y^Jk] ^Jk ^ J*, which are in increasing complexity in the order we've just mentioned. 
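The truncated, renormalized sum in (25) can be sketched as follows. This is only an illustration under a Bernoulli-Gaussian model that is assumed here and not stated in this section: each coefficient is active with probability `p`, active amplitudes are real zero-mean Gaussian with variance `var_c`, and the noise is white Gaussian with variance `var_z`. The function name `truncated_mmse` and all parameter values are hypothetical.

```python
import itertools
import numpy as np

def truncated_mmse(y, A, supports, p, var_c, var_z):
    """Approximate E[c|y] by summing over a restricted list of supports."""
    m, n = A.shape
    log_w, cond_means = [], []
    for J in supports:
        idx = np.array(J, dtype=int)
        A_J = A[:, idx]
        # Covariance of y given that only the entries in J are active.
        Sigma = var_c * (A_J @ A_J.T) + var_z * np.eye(m)
        _, logdet = np.linalg.slogdet(Sigma)
        Sinv_y = np.linalg.solve(Sigma, y)
        log_lik = -0.5 * (y @ Sinv_y + logdet + m * np.log(2.0 * np.pi))  # log P(y|J)
        log_prior = len(idx) * np.log(p) + (n - len(idx)) * np.log(1.0 - p)  # log P(J)
        log_w.append(log_lik + log_prior)
        mean = np.zeros(n)
        mean[idx] = var_c * (A_J.T @ Sinv_y)  # E[c|y,J] under the Gaussian assumption
        cond_means.append(mean)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())  # shift in the log domain for numerical stability
    w /= w.sum()                     # plays the role of dividing by W in (25)
    return w @ np.array(cond_means), w

# Toy example: the search is restricted to a reduced index set.
rng = np.random.default_rng(1)
m, n = 6, 8
A = rng.standard_normal((m, n))
c_true = np.zeros(n)
c_true[[2, 5]] = [1.5, -2.0]
y = A @ c_true + 0.05 * rng.standard_normal(m)
candidates = [1, 2, 5, 6]
supports = [J for k in range(3) for J in itertools.combinations(candidates, k)]
c_mmse, weights = truncated_mmse(y, A, supports, p=0.2, var_c=4.0, var_z=0.05 ** 2)
best = supports[int(np.argmax(weights))]
```

Working with log-weights avoids underflow when one support explains the data far better than the rest, which is the typical situation at moderate to high SNR.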

When using peak suppression to $\gamma$, the receiver is given a vague picture of where clipping has occurred based on the proximity of x to $\gamma$. Consequently, by sorting the magnitudes of the weighting vector in (17) in ascending order, the probability that the true support is contained in the first $\beta$ elements of the sorted index list increases rapidly with $\beta$. Fig. 5 shows a Monte Carlo simulation of this probability at different clipping thresholds. For instance, this implies that given a clipping threshold of $\gamma = 2\sigma_{|x|}$, one could exclude 70% of the $N$ indices as having too low a probability of corresponding to a clipping location, thus reducing the possible candidates from $2^N$ to $2^{0.3N}$ Hamming vectors.
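A Monte Carlo experiment of the kind summarized in Fig. 5 could be sketched as below. Since (17) is not reproduced in this section, the weight used here, the distance of the observed magnitude from the threshold, is purely an assumption, as are the QPSK data, the unit-RMS normalization of the threshold, the noise level, and the function name `containment_probability`.

```python
import numpy as np

def containment_probability(n=256, gamma=2.0, noise_std=0.1, trials=200, seed=0):
    """Return prob[beta] = P(true clipping support lies in the first beta sorted indices)."""
    rng = np.random.default_rng(seed)
    hits = np.zeros(n + 1)
    for _ in range(trials):
        X = rng.choice(np.array([1, -1, 1j, -1j]), size=n)   # QPSK data (assumed)
        x = np.fft.ifft(X, norm="ortho")                     # unit-RMS time signal
        mag = np.abs(x)
        support = np.flatnonzero(mag > gamma)                # true clipping locations
        clipped = x * np.minimum(1.0, gamma / np.maximum(mag, 1e-12))
        noise = noise_std * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
        observed = clipped + noise
        w = np.abs(gamma - np.abs(observed))                 # assumed weight: small near gamma
        order = np.argsort(w)                                # ascending order
        rank = np.empty(n, dtype=int)
        rank[order] = np.arange(n)
        needed = rank[support].max() + 1 if support.size else 0
        hits[needed:] += 1                                   # every beta >= needed succeeds
    return hits / trials

prob = containment_probability()
```

Reading off the smallest `beta` for which `prob[beta]` is close to one gives the size of the reduced index set over which the Bayesian search is run.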

Given this reduced set of vectors, we search over it by first latching a vector of unity Hamming weight based on (25), and then proceeding in a greedy fashion similar to Larsson [49] and Schniter [50], [51] until a maximum sparsity level $s_{max}$ is reached. This preserves the quality of the greedy estimate obtained with Fast Bayesian Matching Pursuit (FBMP) in [50] while reducing the number of executions of (25) by

( ^ /3(1 + P • 5"^^^^ - ^— — ^ — ^ ^ 



100 



where p is the number of tested candidates for each Hamming weight. This corresponds to a reduction of 60-80% of executions with our practical parameters, and we will henceforth refer to this procedure as $\beta$-FBMP.

Fig. 5: Probability of the support index set $\mathcal{I}_c$ being completely included in the first $\beta/N$ % of the sorted weight indices




Fig. 6: SER of PS vs. $\gamma$



VII. Performance Analysis and Simulations 

For our simulations we considered an OFDM signal of $N = 256$ subcarriers, of which $m = 0.2N$ are randomly dispersed measurement tones. The data coefficients were generated from a QAM constellation of size $M = 32$. The Rayleigh fading channel model had 32 taps, operating in a 30 dB SNR environment. The performance parameters we considered were the SER, the relative temporal complexity, the PAPR reduction ability, and the capacity.

Our primary objective was to test how the SER varies with the clipping threshold $\gamma$ for a clipped OFDM signal using our different adaptations of CS algorithms and clipping techniques. Treated as a variable, the clipping threshold is of central importance due to its critical effect on both generic CS performance and the PAPR reduction. Decreasing $\gamma$ significantly reduces the PAPR but also implies a nonlinear increase in the average sparsity level that the estimation algorithms must tolerate. It also has a positive counter-effect on CS performance since it increases the CNR, making the overall behavior of SER($\gamma$) difficult to predict.
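The nonlinear growth of the average sparsity level as the threshold is lowered can be illustrated with a short simulation. The sketch below is not the paper's setup: it assumes QPSK data on all tones, no reserved tones, and a threshold expressed in units of the RMS value of the time-domain signal; `clipped_fraction` is a hypothetical helper name. For a complex Gaussian signal with unit mean-square value, the fraction of samples whose magnitude exceeds $\gamma$ is $e^{-\gamma^2}$, which the simulation should approximately reproduce.

```python
import numpy as np

def clipped_fraction(gammas, n=256, symbols=200, seed=0):
    """Empirical fraction of time-domain samples whose magnitude exceeds each threshold."""
    rng = np.random.default_rng(seed)
    X = rng.choice(np.array([1, -1, 1j, -1j]), size=(symbols, n))  # QPSK (assumed)
    x = np.fft.ifft(X, axis=1, norm="ortho")                       # unit-RMS OFDM symbols
    mags = np.abs(x)
    return np.array([(mags > g).mean() for g in gammas])

gammas = np.array([1.5, 2.0, 2.5])
fractions = clipped_fraction(gammas)
for g, f in zip(gammas, fractions):
    # Compare the simulated fraction with the Gaussian approximation exp(-gamma^2).
    print(f"gamma = {g:.1f}: simulated {f:.4f}, approximation {np.exp(-g ** 2):.4f}")
```

Multiplying the returned fraction by $N$ gives the expected number of clipped coefficients, which is the sparsity level the CS recovery has to cope with at that threshold.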

Furthermore, when testing the precise performance of an algorithm we used the Normalized Mean Square Error

$$\mathrm{NMSE} = E\left[ \frac{\|\hat{c} - c\|_2^2}{\|c\|_2^2} \right]$$

to ensure that an error decrease was not simply due to a decrease in the number of estimated variables.
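A minimal sketch of how such a normalized error could be computed over Monte Carlo trials is given below. It assumes the normalization is by the energy of the true clipping signal in each trial, and it skips trials in which nothing was clipped so as to avoid dividing by zero; the function name `nmse` and the array layout (one trial per row) are choices made here for illustration.

```python
import numpy as np

def nmse(c_true, c_hat):
    """Average of ||c_hat - c||^2 / ||c||^2 over trials (rows)."""
    c_true = np.atleast_2d(np.asarray(c_true))
    c_hat = np.atleast_2d(np.asarray(c_hat))
    err = np.sum(np.abs(c_hat - c_true) ** 2, axis=1)
    energy = np.sum(np.abs(c_true) ** 2, axis=1)
    valid = energy > 0   # trials with no clipping carry no information
    return float(np.mean(err[valid] / energy[valid]))

# One trial with two active coefficients, estimated with a 10% amplitude error.
c = np.array([[0.0, 1.0 + 1.0j, 0.0, -2.0]])
print(nmse(c, 0.9 * c))
```

Normalizing per trial keeps sparse and dense realizations on the same scale, so a lower value reflects better estimates and not merely fewer active coefficients.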

Fig. 6 shows the SER for Peak Suppressing clippers in Section V-A1 after QAM decoding (i.e. decoding after adding the estimate $\hat{c}$ back to the clipped signal) as the clipping threshold is varied. The methods tested were the reduced search space greedy method ($\beta$-FBMP), the LASSO, the Phase-Augmented LASSO (PAL) using (23), the data-based Weighted LASSO (WL), and the Weighted Phase-Augmented LASSO (WPAL). These were compared against two performance bounds: the lower bound of not estimating c, and the upper bound of an oracle receiver that knows the support and simply uses least squares to estimate the coefficients' amplitudes. Interestingly, combining the support and phase augmentation techniques into the LASSO enables it to perform very close to the support oracle, and even beat it at low clipping thresholds where $s > 0.55m$, since it has additional information regarding the coefficients' phases. Furthermore, weighting alone is more effective than phase-augmentation, although both significantly improve the performance of the LASSO.

Fig. 7: NMSE of Digital Clipper estimate as a function of the coefficient magnitude $\zeta$

Fig. 8: NMSE of Digital Clipper estimate as a function of the clipping threshold $\gamma$



To see the effect of varying the magnitude of active coefficients in the digital clipping of Section V-A2, we plotted the NMSE vs. $\zeta$ in Fig. 7. This avoids a biased evaluation due to increased CNR with $\zeta$. The results imply that embedding the phase information into the LASSO in (23) is much more effective than rotating the estimate after compressed sensing in (21). They also show that the former method is quite close to a phase oracle that uses the same technique for practical values of $\zeta$ relative to $\sigma_{|x|}$. However, as expected, they eventually deviate as we increase $\zeta$, since this corresponds to decreasing the SNR and hence the accuracy of the phase information induced from the data vector estimate. Fig. 8 implies that forcing the magnitudes of the estimates in (21) and (23) is generally ineffective except in the very sparse cases for the former. The overall result on the SER is portrayed in Fig. 9 at a fixed $\zeta = 0.8\sigma_{|x|}$.

Complexity-wise, we omit implementation details and orders of complexity since they match those of the standard algorithms we have built on, which are well documented in the CS literature (e.g. [39], [50], [53]). Instead we investigate the practical aspect of the relative time required to execute the major techniques proposed in the paper compared to Tellado's primary tone-reservation algorithm using the same generic CVX software [52].^ To this end we collected the random execution times for 2000 runs of each, normalized them by the maximum execution time among all, and plotted their CCDF. Fig. 10 depicts the results. Roughly speaking, the methods stemming from the LASSO required on average about 11% to 14% of the time required to execute Tellado's primary QCQP algorithm (Table I), while the $\beta$-FBMP required less than 2% of the time.

Fig. 9: SER of Digital Clipping with $\zeta = 0.8\sigma_{|x|}$ vs. $\gamma$

Fig. 10: CCDF of execution time normalized by maximum value

A major advantage of clipping to a fixed threshold is that, unlike tone-reservation methods such as [17], [22], the dynamic range, maximum power, and PAPR of the transmitted signal are fixed. The distribution of the PAPR reduction, $10\log_{10}\left(\max_i |x(i)|^2 / \gamma^2\right)$, would simply follow from the distribution of the maximum squared coefficient in x (refer to [57]-[59] for relevant analysis), which we plot in Fig. 11. The fixed maximum power followed from the
clipping threshold that corresponded to a SER of 10~^ for the different techniques in this work. 
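The relation between a fixed clipping threshold and the resulting peak reduction can be sketched numerically. The example below assumes a magnitude limiter that preserves phase, QPSK data, and a threshold given in units of the signal's RMS value; it measures the reduction as the ratio of the original peak power to $\gamma^2$ in dB. The helper names `peak_suppress` and `peak_reduction_db` are hypothetical.

```python
import numpy as np

def peak_suppress(x, gamma):
    """Limit every sample's magnitude to gamma while keeping its phase."""
    mag = np.abs(x)
    return x * np.minimum(1.0, gamma / np.maximum(mag, 1e-12))

def peak_reduction_db(x, gamma):
    """Peak-power reduction in dB obtained by clipping x to gamma (0 if nothing is clipped)."""
    peak_power = np.max(np.abs(x)) ** 2
    return max(0.0, 10.0 * np.log10(peak_power / gamma ** 2))

rng = np.random.default_rng(0)
X = rng.choice(np.array([1, -1, 1j, -1j]), size=256)   # QPSK data (assumed)
x = np.fft.ifft(X, norm="ortho")                        # unit-RMS OFDM symbol
gamma = 2.0
x_clip = peak_suppress(x, gamma)
reduction = peak_reduction_db(x, gamma)
print(f"peak reduction: {reduction:.2f} dB")
```

Repeating this over many random symbols and collecting `reduction` yields an empirical distribution of the kind plotted as a CCDF.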



TABLE I: Summary of Results

Method          Tolerable $\gamma$         Avg. PAPR Red. (dB)    % Exec. Time
DC (RtS)        $2.40\,\sigma_{|x|}$       3.19                   11.06%
$\beta$-FBMP    $2.26\,\sigma_{|x|}$       3.71                   1.6%
LASSO           $2.25\,\sigma_{|x|}$       3.75                   12.3%
WPAL            $2.02\,\sigma_{|x|}$       4.68                   13.9%
Tellado         -                          4.37                   100%


The most fundamental parameter of interest given a desired clipping threshold is the channel capacity [17], [60]

$$C = \sum_{k=1}^{N} \log_2\left(1 + \frac{|D_k|^2 \sigma_x^2}{\sigma_z^2 + |D_k|^2 \sigma_c^2(k;\gamma)}\right)$$

and we will thus consider two systems. The first system $S_1$ clips all coefficients above $\gamma$ and does not reserve tones to estimate the clipping signal c, resulting in a higher clipping noise over all $N$ tones while retaining all of them for data transmission. The second system $S_2$ reserves $m$ tones to estimate c, thus reducing the SER degradation while also reducing the data tones by $m$.

^With the only exception being Schniter's Greedy algorithm when evaluating $\beta$-FBMP.

Fig. 11: CCDF of PAPR Reduction (dB)

Fig. 12: Capacity per transmitted tone at different clipping thresholds

The justification for reserving tones then depends very much on the variances of the clipping noise $\{\sigma_c^2(k;\gamma)\}_{k=1}^{N}$ with and without estimation at the receiver. Furthermore, if the threshold $\gamma$ is sufficiently low relative to $\sigma_{|x|}$ (e.g. $E[\|c\|_0;\gamma] = 10\%$ of $N$), the clipping noise on each tone will be the result of a reasonably large summation of scaled coefficients of c in the time domain, and so the distribution of the priors in (11) will converge to a Gaussian. With this theoretical justification aided by extensive simulations, we will assume for simplicity that the distortion on each carrier follows a Gaussian with a common variance $\sigma_c^2$. However, caution must be taken when comparing this parameter for the two systems. The reason is that $S_1$ has more data energy than $S_2$ by using all $N$ tones, and will thus have a higher distortion variance at the same clipping level $\gamma$, i.e. $\sigma^2_{c \mid |\Omega_d|=N} > \sigma^2_{c \mid |\Omega_d|=N-m}$. Consequently, the capacity of the first system (after dropping the tone index) will be

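$$C_1 = N \log_2\left(1 + \frac{|D|^2 \sigma^2_{x \mid |\Omega_d|=N}}{\sigma_z^2 + |D|^2 \sigma^2_{c \mid |\Omega_d|=N}}\right) \qquad (26)$$

while the capacity of the second will be

$$C_2 = (N-m) \log_2\left(1 + \frac{|D|^2 \sigma^2_{x \mid |\Omega_d|=N-m}}{\sigma_z^2 + |D|^2 \sigma^2_{(c-\hat{c}) \mid |\Omega_d|=N-m}}\right) \qquad (27)$$

The use of reserved tones for CS is then justified if $C_2 > C_1$, i.e. when

$$\sigma^2_{(c-\hat{c}) \mid |\Omega_d|=N-m} < \frac{\sigma^2_{x \mid |\Omega_d|=N-m}}{\left(1 + \frac{|D|^2 \sigma^2_{x \mid |\Omega_d|=N}}{\sigma_z^2 + |D|^2 \sigma^2_{c \mid |\Omega_d|=N}}\right)^{\frac{N}{N-m}} - 1} - \frac{\sigma_z^2}{|D|^2} \qquad (28)$$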
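The comparison between the two systems can be checked numerically with a few lines of code. The sketch below assumes that each tone sees a signal-to-noise-and-distortion ratio of the form $|D|^2\sigma_x^2 / (\sigma_z^2 + |D|^2\sigma_d^2)$, with $\sigma_d^2$ the clipping-noise variance for $S_1$ and the residual estimation-error variance for $S_2$; this form, the function names, and every numerical value are illustrative assumptions and are not simulation results from this work.

```python
import math

def capacity(n_tones, gain2, var_x, var_dist, var_z):
    """Capacity of n_tones identical tones under the assumed SNDR model."""
    sndr = gain2 * var_x / (var_z + gain2 * var_dist)
    return n_tones * math.log2(1.0 + sndr)

def max_residual_variance(n, m, gain2, var_x1, var_c1, var_x2, var_z):
    """Largest residual variance for which S2 (n - m data tones) still beats S1 (n tones)."""
    sndr1 = gain2 * var_x1 / (var_z + gain2 * var_c1)
    growth = (1.0 + sndr1) ** (n / (n - m))
    return var_x2 / (growth - 1.0) - var_z / gain2

# Hypothetical numbers: 256 tones, 51 reserved, unit channel gain, 30 dB SNR.
n, m, gain2, var_x, var_z = 256, 51, 1.0, 1.0, 1e-3
var_c = 0.05        # clipping-noise variance without estimation (assumed)
var_resid = 0.002   # residual variance after estimating c (assumed)

c1 = capacity(n, gain2, var_x, var_c, var_z)
c2 = capacity(n - m, gain2, var_x, var_resid, var_z)
limit = max_residual_variance(n, m, gain2, var_x, var_c, var_x, var_z)
print(f"C1 = {c1:.1f} bits, C2 = {c2:.1f} bits, residual limit = {limit:.4f}")
```

The limit follows from requiring $(N-m)\log_2(1+\mathrm{SNDR}_2) > N\log_2(1+\mathrm{SNDR}_1)$ and solving for the residual variance, so reserving tones pays off only when the estimator pushes the residual distortion below it.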



It would be very interesting to observe how this parameter behaves as a function of the clipping threshold $\gamma$, as both the distortion and the quality of the estimate nonlinearly counteract each other. Fig. 12 shows such results, based on 1000 runs at each $\gamma$ for estimating $\sigma_c^2$ and $\sigma^2_{(c-\hat{c})}$. The results show that by reserving 20% of the tones for data-based weighted and phase-augmented LASSO, the capacity of such a system can significantly outperform the naive system which uses all the tones for data transmission. What's more, the capacity associated with this technique behaves in a convex fashion so that by reducing the capacity by less than 1 bit per second per transmitted tone, the clipping threshold can be dramatically reduced from $\gamma = 2.5\sigma_{|x|}$ to $\gamma = 2\sigma_{|x|}$. Unlike the semi-linear relation of $S_1$ with $\gamma$, such behavior offers a very tempting compromise between capacity and peak-reduction. Using the typical LASSO at such conditions is effective at clipping thresholds reaching as low as $1.9\sigma_{|x|}$, which is impressive.

Fig. 13 implies that increasing the SNR is much more rewarding for $S_2$ compared to $S_1$, which we test at a fixed clipping threshold of $2.3\sigma_{|x|}$. The reason is that eliminating the noise has no effect on $\sigma_c^2$, and the capacity of the naive system saturates after an SNR of 35 dB. On the other hand, decreasing the noise level improves the CS estimate and hence has a dual effect in increasing the capacity, leading to the semi-linear relation with the SNR.

Fig. 13: Capacity per transmitted tone vs. SNR

VIII. Conclusion 

In this work we have established the new general concept of clipping mitigation (and hence PAPR reduction) in 
OFDM using compressive sensing techniques. The general framework stressed the use of reserved subcarriers to 
compressively estimate the locations and amplitudes of the clipped portions of a transmitted OFDM signal at the 
receiver, instead of using them at the transmitter as a spectral support for optimized peak reducing signals in the 
time domain. Consequently, the method interchanges the stage at which signal processing complexity is required 
compared to the previous techniques, hence introducing a real solution to communication systems that use OFDM 
signals at the physical layer and require minimal complexity at the transmitter. 

The other major contribution is demonstrating how by a marginal increase in complexity one can augment the 
standard $\ell_1$ minimization of CS by extracting information regarding clipping locations, magnitudes, and phases
from the data, and hence enable the system to estimate sparse clippers far beyond the recoverability conditions 
of CS (e.g. sparsity levels above 55% of m). Such augmentation was shown to significantly boost the overall 
system's capacity at low clipping thresholds and thus suggests a very appealing compromise between capacity and 
peak-reduction. 
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